Install Python dependencies for Cloud Composer

This page describes how to install Python packages for your Cloud Composer environment.

About packages in Cloud Composer

This section explains how PyPI packages work in Cloud Composer.

Preinstalled and custom PyPI packages in Cloud Composer images

Cloud Composer images contain both preinstalled and custom PyPI packages.

  • Preinstalled PyPI packages are packages that are included in the Cloud Composer image of your environment. Each Cloud Composer image contains PyPI packages that are specific for your version of Cloud Composer and Airflow.

  • Custom PyPI packages are packages that you can install in your environment in addition to preinstalled packages.

Options to manage PyPI packages for Cloud Composer environments

| Option | Use if |
| --- | --- |
| Install from PyPI | The default way to install packages in your environment. |
| Install from a repository with a public IP address | The package is hosted in a package repository other than PyPI. This repository has a public IP address. |
| Install from an Artifact Registry repository | The package is hosted in an Artifact Registry repository. |
| Install from a repository in your project's network | Your environment does not have access to the public internet. The package is hosted in a package repository in your project's network. |
| Install as a local Python library | The package cannot be found in PyPI, and the library does not have any external dependencies, such as dist-packages. |
| Install a plugin | The package provides plugin-specific functionality, such as modifying the Airflow web interface. |
| PythonVirtualenvOperator | You do not want the package to be installed for all Airflow workers, or the dependency conflicts with preinstalled packages. The package can be found in PyPI and has no external dependencies. |
| KubernetesPodOperator and GKE operators | You require external dependencies that cannot be installed from pip, such as dist-packages, or are on an internal pip server. This option requires more setup and maintenance. Consider it only if other options do not work. |

Before you begin

  • You must have a role that can trigger environment update operations. In addition, the service account of the environment must have a role that has enough permissions to perform update operations.

  • If your environment is protected by a VPC Service Controls perimeter, then before installing PyPI dependencies you must grant additional user identities access to the services that the service perimeter protects and enable support for a private PyPI repository.

  • Requirements must follow the format specified in PEP-508 where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.

  • PyPI dependency updates generate Docker images in Artifact Registry.

  • If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.

  • If you want your builds to run with a custom service account, you can override the COMPOSER_AGENT_BUILD_SERVICE_ACCOUNT environment variable with the chosen service account. This service account must be configured for running builds as described in the Cloud Build documentation, and the environment's service account must have the iam.serviceAccounts.actAs permission on it.

  • If the Cloud Composer API was enabled in your project on April 29, 2024 or later, then, unless your organization overrides the constraints/cloudbuild.disableCreateDefaultServiceAccount policy, the legacy Cloud Build service account is not provisioned when the API is enabled. Because Cloud Build is used by default when installing custom PyPI packages in a Cloud Composer environment, package installation might fail. By default, the environment's service account is used instead, so make sure to also grant that service account any additional permissions required to access your private packages.
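
The requirement format described above can be checked before you trigger an update. The following is a minimal sketch, assuming a simplified pattern (lowercase name, optional extras, optional version specifiers). It is not a full PEP 508 parser: URLs and environment markers are not handled, and the is_valid_requirement name is hypothetical.

```python
import re

# Simplified pattern: lowercase name, optional [extras], optional version specifiers.
# Assumption: this covers only the common cases, not the full PEP 508 grammar.
_OPERATOR = r"(?:==|!=|<=|>=|~=|<|>)"
_VERSION = r"[a-z0-9.*+!_-]+"
_REQUIREMENT = re.compile(
    r"^[a-z0-9]+(?:[._-][a-z0-9]+)*"                      # package name
    r"(?:\[[a-z0-9._-]+(?:\s*,\s*[a-z0-9._-]+)*\])?"      # optional extras
    rf"(?:\s*{_OPERATOR}\s*{_VERSION}"                    # first version specifier
    rf"(?:\s*,\s*{_OPERATOR}\s*{_VERSION})*)?$"           # further specifiers
)


def is_valid_requirement(line: str) -> bool:
    """Return True if the line looks like a lowercase name[extras]specifier entry."""
    return _REQUIREMENT.match(line.strip()) is not None


for candidate in ["scipy>=0.13.3", "scikit-learn", "nltk[machine_learning]", "SciPy>=0.13.3"]:
    print(candidate, is_valid_requirement(candidate))
```

A check like this only catches formatting problems early. Dependency conflicts are still detected by the environment update operation itself.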

View the list of PyPI packages

You can get the list of packages for your environment in several formats.

View preinstalled packages

To view the list of preinstalled packages for your environment, see the list of packages for the Cloud Composer image of your environment.

View all packages

To view all packages (both preinstalled and custom) in your environment:

gcloud

The following gcloud CLI command returns the result of the python -m pip list command for an Airflow worker in your environment. You can use the --tree argument to get the result of the python -m pipdeptree --warn command.

gcloud beta composer environments list-packages \
    ENVIRONMENT_NAME \
    --location LOCATION

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.

View custom PyPI packages

Console

  1. In Google Cloud console, go to the Environments page.

    Go to Environments

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Go to the PyPI packages tab.

gcloud

gcloud composer environments describe ENVIRONMENT_NAME \
  --location LOCATION \
  --format="value(config.softwareConfig.pypiPackages)"

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.

Install custom packages in a Cloud Composer environment

This section describes different methods for installing custom packages in your environment.

Install packages from PyPI

A package can be installed from the Python Package Index (PyPI) if it has no external dependencies and does not conflict with preinstalled packages.

To add, update, or delete the Python dependencies for your environment:

Console

  1. In Google Cloud console, go to the Environments page.

    Go to Environments

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Go to the PyPI packages tab.

  4. Click Edit.

  5. Click Add package.

  6. In the PyPI packages section, specify package names, with optional version specifiers and extras.

    For example:

    • scikit-learn
    • scipy, >=0.13.3
    • nltk, [machine_learning]
  7. Click Save.

gcloud

The gcloud CLI has several arguments for working with custom PyPI packages:

  • --update-pypi-packages-from-file replaces all existing custom PyPI packages with the specified packages. Packages that you do not specify are removed.
  • --update-pypi-package updates or installs one package.
  • --remove-pypi-packages removes specified packages.
  • --clear-pypi-packages removes all packages.

Installing requirements from a file

The requirements.txt file must have each requirement specifier on a separate line.

For example:

scipy>=0.13.3
scikit-learn
nltk[machine_learning]

Update your environment, and specify the requirements.txt file in the --update-pypi-packages-from-file argument.

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --update-pypi-packages-from-file requirements.txt

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
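
The same packages can also be expressed as a mapping from package name to an extras-and-version string, which is the shape used by the API and Terraform options on this page. The following is a rough sketch of that conversion. The requirements_to_mapping helper is hypothetical, and it assumes simple name[extras]specifier lines without URLs or environment markers.

```python
import re

# Split "name[extras]specifier" into the name and everything after it.
_NAME_AND_REST = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(.*)$")


def requirements_to_mapping(text: str) -> dict:
    """Map each package name to its extras-and-version string ("" if none)."""
    packages = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        match = _NAME_AND_REST.match(line)
        if match is None:
            raise ValueError(f"Unrecognized requirement: {raw!r}")
        name, rest = match.groups()
        packages[name.lower()] = rest.strip()
    return packages


example = """\
scipy>=0.13.3
scikit-learn
nltk[machine_learning]
"""
print(requirements_to_mapping(example))
```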

Installing one package

Update your environment, and specify the package, version, and extras in the --update-pypi-package argument.

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --update-pypi-package PACKAGE_NAMEEXTRAS_AND_VERSION

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • PACKAGE_NAME with the name of the package.
  • EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.

Example:

gcloud composer environments update example-environment \
    --location us-central1 \
    --update-pypi-package "scipy>=0.13.3"

Removing packages

Update your environment, and specify the packages that you want to delete in the --remove-pypi-packages argument:

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --remove-pypi-packages PACKAGE_NAMES

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • PACKAGE_NAMES with a comma-separated list of packages.

Example:

gcloud composer environments update example-environment \
    --location us-central1 \
    --remove-pypi-packages scipy,scikit-learn
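
If you script package removal, the command line can be assembled from a list of package names. The following sketch only builds the argument list and prints it. It uses the flags shown above, and the remove_packages_command helper name is hypothetical. Running the command (for example, with subprocess.run) requires an installed and authenticated gcloud CLI.

```python
import shlex
from typing import List, Sequence


def remove_packages_command(environment: str, location: str, packages: Sequence[str]) -> List[str]:
    """Build the gcloud argument list that removes the given custom PyPI packages."""
    if not packages:
        raise ValueError("Specify at least one package to remove")
    return [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
        # The flag takes a comma-separated list of package names.
        "--remove-pypi-packages", ",".join(packages),
    ]


command = remove_packages_command("example-environment", "us-central1", ["scipy", "scikit-learn"])
print(shlex.join(command))
```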

API

Construct an environments.patch API request.

In this request:

  1. In the updateMask parameter, specify the mask:

    • Use config.softwareConfig.pypiPackages mask to replace all existing packages with the specified packages. Packages that you do not specify are deleted.
    • Use config.softwareConfig.pypiPackages.PACKAGE_NAME mask to add or update a specific package. To add or update several packages, specify several masks separated by commas.
  2. In the request body, specify packages and values for versions and extras:

    {
      "config": {
        "softwareConfig": {
          "pypiPackages": {
            "PACKAGE_NAME": "EXTRAS_AND_VERSION"
          }
        }
      }
    }
    

    Replace:

    • PACKAGE_NAME with the name of the package.
    • EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
    • To add more than one package, add extra entries for packages to pypiPackages.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.softwareConfig.pypiPackages.EXAMPLE_PACKAGE,
// config.softwareConfig.pypiPackages.ANOTHER_PACKAGE
{
  "config": {
    "softwareConfig": {
      "pypiPackages": {
        "EXAMPLE_PACKAGE": "",
        "ANOTHER_PACKAGE": ">=1.10.3"
      }
    }
  }
}
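
The request in the preceding example can be assembled programmatically. The following sketch builds the URL with one update mask per package, plus the JSON body. It does not send the request, because authentication is outside the scope of this example. The build_patch_request helper is hypothetical.

```python
import json
from urllib.parse import quote

API_ROOT = "https://composer.googleapis.com/v1"


def build_patch_request(project: str, location: str, environment: str, packages: dict):
    """Return (url, body) for an environments.patch call that adds or updates packages."""
    # One mask per package, so packages that are not listed stay untouched.
    masks = [f"config.softwareConfig.pypiPackages.{name}" for name in packages]
    resource = f"projects/{project}/locations/{location}/environments/{environment}"
    url = f"{API_ROOT}/{resource}?updateMask={quote(','.join(masks), safe=',.')}"
    body = {"config": {"softwareConfig": {"pypiPackages": dict(packages)}}}
    return url, json.dumps(body, indent=2)


url, body = build_patch_request(
    "example-project", "us-central1", "example-environment",
    {"scipy": ">=1.10.3", "nltk": "[machine_learning]"},
)
print(url)
print(body)
```

To replace all custom packages instead, the mask would be config.softwareConfig.pypiPackages on its own, as described in the first step.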

Terraform

The pypi_packages block in the software_config block specifies packages.

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    software_config {

      pypi_packages = {
          PACKAGE_NAME = "EXTRAS_AND_VERSION"
      }

    }
  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • PACKAGE_NAME with the name of the package.
  • EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
  • To add more than one package, add extra entries for packages to pypi_packages.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    software_config {
      pypi_packages = {
          scipy = ">=1.10.3"
          scikit-learn = ""
          nltk = "[machine_learning]"
      }
    }
  }
}

Install packages from a public repository

You can install packages hosted in other repositories that have a public IP address.

The packages must be properly configured so that the default pip tool can install them.

To install from a package repository that has a public address:

  1. Create a pip.conf file and include the following information in the file, if applicable:

    • URL of the repository (in the index-url parameter)
    • Access credentials for the repository
    • Non-default pip installation options

    Example:

    [global]
    index-url=https://example.com/
    
  2. (Optional) In some cases, you might want to fetch packages from multiple repositories, such as when the public repository contains some specific packages that you want to install, and you want to install all other packages from PyPI:

    1. Configure an Artifact Registry virtual repository.
    2. Add configuration for multiple repositories (including PyPI, if needed) and define the order in which pip searches the repositories.
    3. Specify the virtual repository's URL in the index-url parameter.
  3. Determine the URI of your environment's bucket.

  4. Upload the pip.conf file to the /config/pip/ folder in your environment's bucket.

  5. Install packages using one of the available methods.
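
The pip.conf file from step 1 can be generated rather than written by hand, for example when the repository URL comes from a deployment script. The following is a minimal sketch that uses the standard configparser module. The render_pip_conf helper is hypothetical, and uploading the result to the /config/pip/ folder is not shown.

```python
import configparser
import io
from typing import Dict, Optional


def render_pip_conf(index_url: str, options: Optional[Dict[str, str]] = None) -> str:
    """Render the text of a pip.conf file with a single [global] section."""
    # interpolation=None keeps "%" characters (for example, in credentials) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url, **(options or {})}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


pip_conf_text = render_pip_conf("https://example.com/")
print(pip_conf_text)
```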

Install packages from an Artifact Registry repository

You can store packages in an Artifact Registry repository in your project, and configure your environment to install from it.

Configure roles and permissions:

  1. The service account of your environment must have the iam.serviceAccountUser role.

  2. Make sure that the Cloud Build service account has permissions to read from your Artifact Registry repository.

  3. If your environment has restricted access to other services in your project, for example, if you use VPC Service Controls:

    1. Assign permissions to access your Artifact Registry repository to the environment's service account instead of the Cloud Build service account.

    2. Make sure that connectivity to the Artifact Registry repository is configured in your project.

To install custom PyPI packages from an Artifact Registry repository:

  1. Create a pip.conf file and include the following information in the file, if applicable:

    • URL of the Artifact Registry repository (in the index-url parameter)
    • Access credentials for the repository
    • Non-default pip installation options

    For an Artifact Registry repository, append /simple/ to the repository URL:

    [global]
    index-url = https://us-central1-python.pkg.dev/example-project/example-repository/simple/
    
  2. (Optional) In some cases, you might want to fetch packages from multiple repositories, such as when your Artifact Registry repository contains some specific packages that you want to install, and you want to install all other packages from PyPI:

    1. Configure an Artifact Registry virtual repository.
    2. Add configuration for multiple repositories (including PyPI, if needed) and define the order in which pip searches the repositories.
    3. Specify the virtual repository's URL in the index-url parameter.
  3. Upload this pip.conf file to the /config/pip/ folder in your environment's bucket. For example: gs://us-central1-example-bucket/config/pip/pip.conf.

  4. Install packages using one of the available methods.
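
The index-url value for an Artifact Registry repository follows a regular pattern. The following sketch derives it from the repository's location, project, and name. The pattern is inferred from the example URL in step 1, and the artifact_registry_index_url helper is hypothetical, so verify the result against the URL that Artifact Registry reports for your repository.

```python
def artifact_registry_index_url(location: str, project: str, repository: str) -> str:
    """Build the pip index URL for a Python repository, including the /simple/ suffix."""
    return f"https://{location}-python.pkg.dev/{project}/{repository}/simple/"


index_url = artifact_registry_index_url("us-central1", "example-project", "example-repository")
print(index_url)
```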

Install packages from a private repository

You can host a private repository in your project's network and configure your environment to install Python packages from it.

Configure roles and permissions:

  1. The service account for your Cloud Composer environment must have the iam.serviceAccountUser role.

  2. If you install custom PyPI packages from a repository in your project's network, and this repository does not have a public IP address:

    1. Assign permissions to access this repository to the environment's service account.

    2. Make sure that connectivity to this repository is configured in your project.

To install packages from a private repository hosted in your project's network:

  1. Create a pip.conf file and include the following information in the file, if applicable:

    • IP address of the repository in your project's network
    • Access credentials for the repository
    • Non-default pip installation options

    Example:

    [global]
    index-url=https://192.0.2.10/
    
  2. (Optional) In some cases, you might want to fetch packages from multiple repositories, such as when the private repository contains some specific packages that you want to install, and you want to install all other packages from PyPI:

    1. Configure an Artifact Registry virtual repository.
    2. Add configuration for multiple repositories (including PyPI, if needed) and define the order in which pip searches the repositories.
    3. Specify the virtual repository's URL in the index-url parameter.
  3. (Optional) In 2.2.1 and later versions of Cloud Composer, you can use a custom certificate when installing packages from your private repository. To do so:

    1. Upload the certificate file to the /config/pip/ folder in your environment's bucket.

    2. In pip.conf, specify the name of the certificate file in the cert parameter. Do not change the /etc/pip/ folder.

      Example:

      [global]
      cert = /etc/pip/example-certificate.pem
      
  4. Upload the pip.conf file to the /config/pip/ folder in your environment's bucket. For example: gs://us-central1-example-bucket/config/pip/pip.conf.

  5. Install packages using one of the available methods.
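
The certificate step involves two different paths: the location in the bucket where you upload the file, and the path that pip.conf refers to. The following sketch derives both from a bucket name and a file name. It assumes, based on the example above, that files uploaded to /config/pip/ are referenced under /etc/pip/. The certificate_locations helper is hypothetical.

```python
import posixpath


def certificate_locations(bucket: str, certificate_file: str):
    """Return (upload URI in the bucket, value for the cert parameter in pip.conf)."""
    name = posixpath.basename(certificate_file)  # keep only the file name
    upload_uri = f"gs://{bucket}/config/pip/{name}"
    cert_value = f"/etc/pip/{name}"  # assumption: mirrors the /config/pip/ folder
    return upload_uri, cert_value


upload_uri, cert_value = certificate_locations("us-central1-example-bucket", "example-certificate.pem")
print(upload_uri)
print(cert_value)
```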

Install a local Python library

To install an in-house or local Python library:

  1. Place the dependencies within a subdirectory in the dags/ folder in your environment's bucket. To import a module from a subdirectory, each subdirectory in the module's path must contain an __init__.py package marker file.

    In the following example, the dependency is coin_module.py:

    dags/
      use_local_deps.py  # A DAG file.
      dependencies/
        __init__.py
        coin_module.py
    
  2. Import the dependency from the DAG definition file.

    For example:

from dependencies import coin_module
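
To see why the __init__.py marker and the folder layout matter, you can reproduce the layout locally. The following sketch recreates the example tree in a temporary directory and imports the module the same way a DAG file would, assuming the dags/ folder is on the Python import path. The contents of coin_module.py, including the flip_coin function, are hypothetical because the module body is not shown on this page.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module body, used only for illustration.
COIN_MODULE = '''
import random

def flip_coin():
    """Return "heads" or "tails" at random."""
    return random.choice(["heads", "tails"])
'''

with tempfile.TemporaryDirectory() as tmp:
    dags = Path(tmp) / "dags"
    dependencies = dags / "dependencies"
    dependencies.mkdir(parents=True)
    (dependencies / "__init__.py").write_text("")  # package marker file
    (dependencies / "coin_module.py").write_text(COIN_MODULE)

    sys.path.insert(0, str(dags))  # stands in for the dags/ folder being importable
    try:
        coin_module = importlib.import_module("dependencies.coin_module")
        result = coin_module.flip_coin()
    finally:
        sys.path.remove(str(dags))

print(result)
```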

Use packages that depend on shared object libraries

Certain PyPI packages depend on system-level libraries. While Cloud Composer does not support system libraries, you can use the following options:

  • Use the KubernetesPodOperator. Set the operator image to a custom-built image. Use this option if packages fail during installation because of an unmet system dependency.

  • Upload the shared object libraries to your environment's bucket. If your PyPI packages have installed successfully but fail at runtime, use this option.

    1. Manually find the shared object libraries for the PyPI dependency (an .so file).
    2. Upload the shared object libraries to the /plugins folder in your environment's bucket.
    3. Set the following environment variable: LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/airflow/gcs/plugins
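
If a package still fails at runtime after these steps, it can help to confirm from inside a task that the plugins folder is on the library search path. The following is a small diagnostic sketch. The plugins_dir_on_library_path helper is hypothetical, and the check only inspects the environment variable, not whether the .so files actually load.

```python
import os
from typing import Mapping, Optional

PLUGINS_DIR = "/home/airflow/gcs/plugins"


def plugins_dir_on_library_path(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the plugins folder is listed in LD_LIBRARY_PATH."""
    env = os.environ if environ is None else environ
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    # Ignore a trailing slash so "/plugins/" and "/plugins" compare equal.
    return PLUGINS_DIR in (entry.rstrip("/") for entry in entries)


print(plugins_dir_on_library_path({"LD_LIBRARY_PATH": "/usr/lib:/home/airflow/gcs/plugins"}))
```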

Install packages in private IP environments

This section explains how to install packages in private IP environments.

Depending on how you configure your project, your environment might not have access to the public internet.

Private IP environment with public internet access

If your private IP environment can access the public internet, then you can install packages using the same options as for public IP environments.

Private IP environment without internet access

If your private IP environment does not have access to public internet, then you can install packages using one of the following ways:

  • Use a private PyPI repository hosted in your project's network.
  • Use a proxy server VM in your project's network to connect to a PyPI repository on the public internet. Specify the proxy address in the /config/pip/pip.conf file in your environment's bucket.
  • Use an Artifact Registry repository as the only source of packages. To do so, redefine the index-url parameter, as described in the "Install packages from an Artifact Registry repository" section.
  • If your security policy permits access to external IP addresses from your VPC network, you can enable the installation of packages from repositories on the public internet by configuring Cloud NAT.
  • Put Python dependencies into the /dags folder in your environment's bucket to install them as local libraries. This might not be a good option if the dependency tree is large.

Install to a private IP environment under resource location restrictions

Keeping your project in line with Resource Location Restriction requirements prohibits the use of some tools. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.

To install Python dependencies in such an environment, follow the guidance for private IP environments without internet access.

Install a Python dependency to a private IP environment in a VPC Service Controls perimeter

Protecting your project with a VPC Service Controls perimeter results in further security restrictions. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.

To install Python dependencies for a private IP environment inside a perimeter, follow the guidance for private IP environments without internet access.

What's next